Large Alphabet Coding and Prediction through Poissonization and Tilting

Authors

  • Xiao Yang
  • Andrew R. Barron
Abstract

This paper introduces a convenient strategy for compression and prediction of sequences of independent, identically distributed random variables generated from a large alphabet of size m. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and is close to optimal for a smaller class – the class of distributions with an analogous condition on the counts. Moreover, the method can be used to code and predict sequences in a subset with the tail counts satisfying a given condition, and it can also be applied to envelope classes.
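The abstract gives no formulas, so the following is only a minimal sketch of the standard Poissonization identity that makes the "independence" remark concrete. It is not the authors' code, and it does not cover the tilting step. The assumption is that the sample size N is Poisson with a hypothetical mean `rate`, and the symbols are i.i.d. with hypothetical probabilities `probs`. Under that assumption, the joint probability of the symbol counts factors into independent Poisson terms with means `rate * p_j`.

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def multinomial_pmf(counts, probs):
    """P(counts) for a multinomial with total sum(counts) and cell probabilities probs."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    value = float(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def joint_via_total(counts, probs, rate):
    """Draw N ~ Poisson(rate), then split N i.i.d. symbols into counts."""
    return poisson_pmf(sum(counts), rate) * multinomial_pmf(counts, probs)


def joint_via_independence(counts, probs, rate):
    """Treat each count as an independent Poisson(rate * p_j) variable."""
    return math.prod(poisson_pmf(c, rate * p) for c, p in zip(counts, probs))


# Illustrative values only (not taken from the paper).
probs = [0.5, 0.3, 0.2]
counts = [4, 1, 2]
rate = 6.0

a = joint_via_total(counts, probs, rate)
b = joint_via_independence(counts, probs, rate)
print(math.isclose(a, b, rel_tol=1e-9))
```

The two computations agree up to floating-point rounding. This is why a Poisson sample size lets each symbol's count be coded and analyzed separately.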

Similar resources

Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting

This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size m. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal ...

A large-alphabet-oriented scheme for Chinese and English text compression

In this paper, a large-alphabet-oriented scheme is proposed for both Chinese and English text compression. Our scheme parses Chinese text with the alphabet defined by Big-5 code, and parses English text with some rules designed here. Thus, the alphabet used for English is not a word alphabet. After the text is parsed into tokens, zero-, first-, and second-order Markov models are used to estimate the occu...

Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding

The problem of predicting a sequence x_1, x_2, ... generated by a discrete source with unknown statistics is considered. Each letter x_{t+1} is predicted using information on the word x_1 x_2 ... x_t only. In fact, this problem is a classical problem which has received much attention. Its history can be traced back to Laplace. We address the problem where each x_i belongs to some large (or even infin...

Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding

The problem of predicting a sequence x_1, x_2, ... generated by a discrete source with unknown statistics is considered. Each letter x_{t+1} is predicted using the information on the word x_1 x_2 ... x_t only. This problem is of great importance for data compression, because of its use to estimate probability distributions for PPM algorithms and other adaptive codes. On the other hand, such predicti...

Alphabet Partitioning Techniques for Semi-Adaptive Huffman Coding of Large Alphabets

Practical applications that employ entropy coding for large alphabets often partition the alphabet set into two or more layers and encode each symbol by using some suitable prefix coding for each layer. In this paper, we formulate the problem of finding an alphabet partitioning for the design of a two-layer semi-adaptive code as an optimization problem, and give a solution based on dynamic prog...

Publication date: 2014